Improving Multiple Object Tracking with Optical Flow and Edge Preprocessing
In this paper, we present a new method for detecting road users in an urban
environment which leads to an improvement in multiple object tracking. Our
method takes as an input a foreground image and improves the object detection
and segmentation. This new image can be used as an input to trackers that use
foreground blobs from background subtraction. The first step is to create
foreground images for all the frames in an urban video. Then, starting from the
original blobs of the foreground image, we merge the blobs that are close to
one another and that have similar optical flow. The next step is extracting the
edges of the different objects to detect multiple objects that might be very
close (and be merged in the same blob) and to adjust the size of the original
blobs. At the same time, we use the optical flow to detect occlusion of objects
that are moving in opposite directions. Finally, we make a decision on which
information we keep in order to construct a new foreground image with blobs
that can be used for tracking. The system is validated on four videos of an
urban traffic dataset. Our method improves recall and precision for the object
detection task compared to plain background subtraction, and improves the
CLEAR MOT metrics for the tracking task on most videos.
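The blob-merging step described above (merge blobs that are close to one another and have similar optical flow) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Blob` type, the bounding-box gap as the distance measure, the Euclidean difference between mean flow vectors as the similarity measure, and the thresholds `max_gap` and `max_flow_diff` are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class Blob:
    """Hypothetical foreground blob: bounding box plus mean optical flow."""
    box: tuple   # (x_min, y_min, x_max, y_max) in pixels
    flow: tuple  # mean flow (dx, dy) in pixels per frame


def box_gap(a, b):
    """Distance between two axis-aligned boxes (0 if they overlap or touch)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return math.hypot(dx, dy)


def merge_blobs(blobs, max_gap=10.0, max_flow_diff=1.0):
    """Merge blobs that are both spatially close and moving alike.

    Thresholds are assumed values for illustration only.
    """
    parent = list(range(len(blobs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(blobs)):
        for j in range(i + 1, len(blobs)):
            close = box_gap(blobs[i].box, blobs[j].box) <= max_gap
            alike = math.dist(blobs[i].flow, blobs[j].flow) <= max_flow_diff
            if close and alike:
                parent[find(j)] = find(i)

    # Collect the members of each merged group.
    groups = {}
    for i, blob in enumerate(blobs):
        groups.setdefault(find(i), []).append(blob)

    merged = []
    for members in groups.values():
        x0, y0, x1, y1 = zip(*(m.box for m in members))
        n = len(members)
        mean_flow = (sum(m.flow[0] for m in members) / n,
                     sum(m.flow[1] for m in members) / n)
        merged.append(Blob((min(x0), min(y0), max(x1), max(y1)), mean_flow))
    return merged


blobs = [
    Blob((0, 0, 10, 10), (2.0, 0.0)),    # fragment of a vehicle moving right
    Blob((12, 0, 20, 10), (2.2, 0.1)),   # nearby fragment, same motion
    Blob((14, 0, 24, 10), (-2.0, 0.0)),  # overlapping object moving left
]
merged = merge_blobs(blobs)
```

In this toy input the first two fragments are merged into one blob, while the third stays separate even though its box overlaps the second, because its flow points the opposite way. That is the same cue the abstract mentions for detecting occlusions between objects moving in opposite directions.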
Apprentissage profond pour vision stéréoscopique multispectrale
ABSTRACT: This thesis presents new methods for estimating the disparity of human subjects, defined as the displacement between corresponding pixels of human silhouettes in visible (RGB) and infrared (LWIR) images. The goal of disparity estimation is to find, for each pixel in the left image, the corresponding pixel in the right image. This makes it possible to match objects of interest across the two views of a scene, which is useful in applications such as video surveillance and autonomous vehicles. Several factors make the task difficult: in addition to the challenges inherent to the stereo nature of the problem, correspondences must be established between images from two different spectra that share little common information. Methods in the literature rely on handcrafted feature descriptors, but we believe that better-performing methods can be obtained with convolutional neural networks.
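To make the definition of disparity concrete, the sketch below finds, for one pixel on a scanline of the left image, the horizontal shift that best matches the right image. It is a classical sum-of-absolute-differences search on a single intensity channel, shown only to illustrate the definition; it is not the thesis's method, which relies on convolutional neural networks and RGB/LWIR pairs. The function name, the window size, the search range, and the convention that left pixel `x` matches right pixel `x - d` (a rectified pair) are assumptions.

```python
def disparity_at(left_row, right_row, x, half_window=1, max_disparity=4):
    """Return the shift d for which the window around x in left_row best
    matches the window around x - d in right_row (lowest SAD cost)."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        cost = 0
        valid = True
        for k in range(-half_window, half_window + 1):
            xl, xr = x + k, x + k - d
            # Skip candidate shifts whose window leaves either row.
            if not (0 <= xl < len(left_row) and 0 <= xr < len(right_row)):
                valid = False
                break
            cost += abs(left_row[xl] - right_row[xr])
        if valid and cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Toy scanlines: the same intensity pattern, shifted by two pixels.
left_row = [0, 0, 0, 0, 9, 5, 1, 0, 0, 0]
right_row = [0, 0, 9, 5, 1, 0, 0, 0, 0, 0]
d = disparity_at(left_row, right_row, x=5)
```

An intensity-based cost like this one works only when both images show similar appearance, which is precisely what fails between the visible and infrared spectra and motivates the learned approach described in the abstract.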